Efficient approximations for learning phylogenetic HMM models from data

Authors

  • Vladimir Jojic
  • Nebojsa Jojic
  • Christopher Meek
  • Dan Geiger
  • Adam C. Siepel
  • David Haussler
  • David Heckerman
Abstract

MOTIVATION: We consider models useful for learning an evolutionary or phylogenetic tree from data consisting of DNA sequences corresponding to the leaves of the tree. In particular, we consider a general probabilistic model, described in Siepel and Haussler, that we call the phylogenetic-HMM model; it generalizes the classical probabilistic models of Neyman and Felsenstein. Unfortunately, computing the likelihood of phylogenetic-HMM models is intractable. We consider several approximations for computing the likelihood of such models, including an approximation introduced in Siepel and Haussler, loopy belief propagation, and several variational methods.

RESULTS: We demonstrate that, unlike the other approximations, variational methods are accurate and are guaranteed to lower-bound the likelihood. In addition, we identify one particular variational approximation as the best: the one in which the posterior distribution is variationally approximated using the classic Neyman-Felsenstein model. Applying our best approximation to data from the cystic fibrosis transmembrane conductance regulator gene region across nine eutherian mammals reveals a CpG effect.
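For orientation, the classical Neyman-Felsenstein setting that the phylogenetic-HMM generalizes treats alignment columns as independent, and the likelihood of a single column can be computed exactly with Felsenstein's pruning algorithm. The Python sketch below assumes a Jukes-Cantor substitution model, a uniform root distribution, and a hypothetical four-taxon tree (`TREE`) with made-up branch lengths; none of these choices come from the paper, and the sketch does not implement the phylogenetic-HMM or any of the variational approximations. It checks pruning against a brute-force sum over internal-node states, which is feasible only for tiny trees.

```python
import itertools
import math

BASES = "ACGT"


def jc_transition(t):
    """Jukes-Cantor substitution matrix P(t); t = expected substitutions per site (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial_likelihood(node, tree, column):
    """Felsenstein pruning: L[x] = P(observed leaves below node | node in state x)."""
    if node not in tree:  # leaf: indicator vector on the observed base
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in tree[node]:
        P = jc_transition(t)
        child_L = partial_likelihood(child, tree, column)
        for x in range(4):
            # Sum over the child's unobserved state y along this branch.
            result[x] *= sum(P[x][y] * child_L[y] for y in range(4))
    return result


def column_likelihood(tree, root, column):
    """P(column) with a uniform root distribution (the Jukes-Cantor stationary distribution)."""
    return sum(0.25 * lx for lx in partial_likelihood(root, tree, column))


def brute_force_likelihood(tree, root, column):
    """Reference value: explicitly sum over every assignment of internal-node states."""
    internal = list(tree)  # every key of `tree` is an internal node, including the root
    total = 0.0
    for states in itertools.product(range(4), repeat=len(internal)):
        assign = dict(zip(internal, states))
        for leaf, base in column.items():
            assign[leaf] = BASES.index(base)
        p = 0.25  # uniform prior on the root state
        for parent, children in tree.items():
            for child, t in children:
                p *= jc_transition(t)[assign[parent]][assign[child]]
        total += p
    return total


def alignment_log_likelihood(tree, root, columns):
    """Independent-columns assumption: log P(alignment) = sum of per-column log-likelihoods."""
    return sum(math.log(column_likelihood(tree, root, c)) for c in columns)


# Hypothetical four-taxon tree: internal node -> list of (child, branch length).
TREE = {
    "root": [("anc1", 0.2), ("anc2", 0.2)],
    "anc1": [("human", 0.1), ("chimp", 0.1)],
    "anc2": [("mouse", 0.3), ("rat", 0.3)],
}

if __name__ == "__main__":
    col = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
    print(column_likelihood(TREE, "root", col), brute_force_likelihood(TREE, "root", col))
```

Pruning costs time linear in the number of tree nodes per column, while the brute-force sum grows as 4 to the power of the number of internal nodes. Once neighbouring columns are coupled, as in the phylogenetic-HMM, a per-column recursion like this no longer yields the exact likelihood, which is why the abstract turns to approximations.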

Similar resources

Bayesian Online Algorithms for Learning in Discrete Hidden Markov Models

We propose and analyze two different Bayesian online algorithms for learning in discrete Hidden Markov Models and compare their performance with the already known Baldi-Chauvin Algorithm. Using the Kullback-Leibler divergence as a measure of generalization we draw learning curves in simplified situations for these algorithms and compare their performances. 1. Introduction. The unifying perspect...

Novel Phylogenetic Network Inference by Combining Maximum Likelihood and Hidden Markov Models

Horizontal Gene Transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (which are also named chimeric HGT) w...

Stochastic Variational Inference for the HDP-HMM

We derive a variational inference algorithm for the HDP-HMM based on the two-level stick breaking construction. This construction has previously been applied to the hierarchical Dirichlet processes (HDP) for mixed membership models, allowing for efficient handling of the coupled weight parameters. However, the same algorithm is not directly applicable to HDP-based infinite hidden Markov models ...

Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression

The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression due to its ability to describe noisy observations arriving irregularly in time. However, the lack of an efficient parameter learning algorithm for CT-HMM restricts its use to very small models or requires unrealistic constraints on the state transitions. In this paper, we present the first...

Using Hidden Markov Models to Evaluate the Quality of Discovered Process Models

Hidden Markov Models (HMMs) are a stochastic signal modeling formalism that is actively used in the machine learning community for a wide range of applications such as speech and activity recognition. Efficient techniques exist to learn HMM models from a given data set, and to estimate the data likelihood with respect to a given HMM (i.e., “How probable is it that these data were produced by th...

Journal title:
  • Bioinformatics

Volume 20, Suppl 1

Publication date: 2004